Abstract: This paper proposes an area- and speed-efficient reversible fused Radix-2 FFT unit using
4:3 compressors. A Radix-2 reversible FFT unit requires 24-bit and 48-bit reversible adders,
24-bit and 48-bit reversible subtractors and 24x24 bit reversible multiplier units. In the
proposed architecture, the 24-bit adder is realized as a reversible carry-look-ahead adder
using PRT-2 gates. The proposed reversible carry-look-ahead adder is efficient in terms of
transistor count, critical path delay and garbage outputs. The reversible subtractor is realized
using TR gates with a lower critical path delay. The 24x24 bit multiplication is decomposed into
nine parallel reversible 8x8 bit multiplication modules. A new reversible 24x24 bit multiplier
is proposed in which the partial products are added using reversible 4:3 compressors realized
with PRT-2 gates. The proposed multiplier is optimized in terms of critical path delay and
garbage outputs. This paper describes three reversible fused operations and applies them to
the implementation of Fast Fourier Transform processors. The fused operations are the
reversible add-subtract unit, the reversible multiply-add unit and the reversible multiply-subtract
unit. The reversible Radix-2 FFT butterfly unit is thus implemented efficiently with the three
fused operations. The fused reversible FFT unit using 4:3 compressors operates at a greater
speed and consumes fewer logic resources than the discrete implementation.

Index Terms: Fused arithmetic operations, Fast Fourier Transform, Radix-2 FFT Butterfly
Unit, Reversible 4:3 Compressor.

I. Introduction

The Discrete Fourier Transform (DFT) is one of the most widely used digital signal processing (DSP)
algorithms. DFTs are almost never computed directly; instead they are calculated using the Fast Fourier
Transform (FFT), a collection of algorithms that efficiently calculate the DFT of a sequence.
The number of applications for FFTs continues to grow and includes such diverse areas as communications,
signal processing, instrumentation, biomedical engineering, acoustics, numerical methods and
applied mechanics. As semiconductor technologies move toward finer geometries, both the available
performance and the functionality per die increase. Unfortunately, the power consumption of processors
fabricated in advancing technologies also continues to grow. This power increase has resulted in the current
situation, in which potential FFT applications, formerly limited by available performance, are now frequently
limited by available power budgets. The recent dramatic increase in the number of portable and embedded
applications has contributed significantly to this growing number of power-limited opportunities. The goal of
this research is to develop the architecture and circuits necessary for high-performance and energy-efficient
FFT processors, with an emphasis on a VLSI implementation. Thus, a new methodology has to be evolved
to implement today's complex systems with lower power consumption. According to R. Landauer's
research in the early 1960s, every lost bit causes an information loss, and the use of
conventional irreversible logic gates therefore leads to power dissipation. The amount of energy dissipated for every
irreversible bit operation is at least kT ln 2, where T is the absolute temperature at which the computation is
performed and k is Boltzmann's constant [1]. Bennett showed that in order to avoid this energy loss it is
necessary that all the computations be performed in a reversible way [2]. The total energy dissipation
is due to dynamic switching, static loss associated with non-ideal switches and information loss.
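
To give a sense of scale for the kT ln 2 bound quoted above, the following minimal sketch evaluates it numerically. The temperature of 300 K is an assumed room-temperature value chosen for illustration and does not come from the paper.

```python
import math

# Boltzmann's constant in joules per kelvin (exact SI value).
BOLTZMANN_K = 1.380649e-23


def landauer_limit_joules(temperature_kelvin: float) -> float:
    """Return k * T * ln 2, the bound on energy dissipated per lost bit."""
    return BOLTZMANN_K * temperature_kelvin * math.log(2)


# 300 K is an assumed operating temperature used only for illustration.
energy_per_bit = landauer_limit_joules(300.0)
print(f"{energy_per_bit:.3e} J per irreversible bit operation")
```

The value is tiny per bit, which is why the argument for reversible logic concerns the information-loss component of dissipation rather than the dynamic or static components.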
The power consumption of dynamic components can be expressed as $P \propto C V^2 f$, where C is the
capacitance, V is the voltage swing and f is the clock frequency. The dynamic power dissipation can be
reduced by decreasing the switching capacitance, the power supply voltage or the operating frequency. To minimize
the static power dissipation, transistors should be operated at a higher threshold voltage, which
reduces the leakage current. Finally, to reduce the power dissipation due to information loss, circuits
must be constructed from reversible logic gates. Conventional logic circuits are not reversible. A circuit is
said to be reversible if there is a one-to-one correspondence between its input and output assignments: a
reversible circuit maps each input vector to a unique output vector and vice versa. Reversible logic has
received significant attention in recent years. It has applications in various research areas such as low power
CMOS design, optical computing, quantum computing, bioinformatics, thermodynamic technology, DNA
computing and nanotechnology [3]. Synthesis of reversible logic circuits is significantly more complicated
than that of traditional irreversible logic circuits because fan-out and feedback are not allowed in a
reversible logic circuit [4]. A reversible logic circuit should have the following features [5]:
Use minimum number of reversible gates. 
Use minimum number of garbage outputs. 
Use minimum constant inputs. 

An output that cannot be used further in the computation is known as a garbage output. An input that
is added to an n x k function to make it reversible is called a constant input [6]. The quantum cost of a reversible
or quantum circuit is defined as the number of 1x1 or 2x2 gates used to implement the circuit. The major
objective of a reversible logic design is to minimize the quantum cost and the number of garbage outputs [7].
Hence, one of the major issues in reversible circuit design is garbage minimization. Another significant
criterion in designing a reversible logic circuit is to minimize the number of reversible gates used [8].
This paper proposes a reversible Radix-2 FFT butterfly unit, in both fused and discrete form, using 4:3
compressors. A 4-bit reversible carry-look-ahead adder is designed using PRT-2
gates, and the 24-bit and 48-bit reversible carry-look-ahead adders are realized from this 4-bit
adder. For subtraction, a 24-bit and a 48-bit reversible subtractor are
designed using TR gates with a lower critical path delay. A 24x24 bit reversible multiplier is realized using nine
8x8 multipliers, in which 4:3 compressors are used to add the partial products. The reversible 4:3
compressor is designed using PRT-2 gates because of their lower critical path delay. All the design units are
compared in terms of garbage outputs, critical path delay and the number of reversible gates required to
implement the proposed design. The paper is organized as follows. Section 2 is an
overview of reversible logic gates. Section 3 describes the reversible representation of the Radix-2 FFT
butterfly unit as a fused and as a discrete element. Section 4 surveys the existing work.
Section 5 describes the proposed design of the 4-bit, 24-bit and 48-bit reversible carry-look-ahead adders. Section
6 presents the proposed design of the reversible 24-bit subtractor. Section 7 describes the proposed design
of the reversible 24x24 bit multiplier using 4:3 compressors. Section 8 presents the proposed design of the
fused reversible Radix-2 FFT butterfly unit. The result analysis is given in section 9 and the conclusions
in section 10.

II. Reversible Logic Gates

There are a number of commonly used reversible logic gates, such as the Feynman gate, Toffoli gate, Fredkin gate,
Peres gate, New Gate, Feynman Double gate and PRT gates. The important reversible gates used in this work
are the Feynman gate, Fredkin gate, Toffoli gate and PRT gates. Table I shows the symbolic representation
of the reversible Feynman gate, Toffoli gate, TR gate, Fredkin gate, PRT-2 gate and PRT-1 gate.
Two of the universal 3x3 reversible gates that have received much attention are the Fredkin gate and the Toffoli
gate.
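
The one-to-one property that defines a reversible gate can be checked by enumeration. The sketch below uses the standard textbook definitions of the Feynman, Toffoli and Fredkin gates, which are assumed here because Table I is not reproduced in this text. The PRT and TR gates are left out because their definitions are not given. A conventional AND gate is included for contrast.

```python
from itertools import product


def feynman(a, b):
    """Feynman (controlled-NOT) gate: (A, B) -> (A, A xor B)."""
    return a, a ^ b


def toffoli(a, b, c):
    """Toffoli gate: (A, B, C) -> (A, B, C xor AB)."""
    return a, b, c ^ (a & b)


def fredkin(a, b, c):
    """Fredkin (controlled-swap) gate: swaps B and C when A is 1."""
    return (a, c, b) if a else (a, b, c)


def and_gate(a, b):
    """Conventional irreversible AND gate with a single output."""
    return (a & b,)


def is_reversible(gate, width):
    """A gate is reversible if distinct inputs always give distinct outputs."""
    inputs = list(product((0, 1), repeat=width))
    outputs = {gate(*bits) for bits in inputs}
    return len(outputs) == len(inputs)


for name, gate, width in [
    ("Feynman", feynman, 2),
    ("Toffoli", toffoli, 3),
    ("Fredkin", fredkin, 3),
    ("AND", and_gate, 2),
]:
    print(name, "reversible" if is_reversible(gate, width) else "irreversible")
```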
Table I. Reversible Logic Gates

III. Representation Of The Radix-2 FFT Butterfly Unit As A Discrete And As A Fused Element
A. Need for Compressors 

For high order multiplication, high order compressors are used to compress the bits [9]. The 4:2 compressor
has been widely employed in high speed multipliers to lower the latency of the partial product
accumulation stage. Owing to its regular interconnection, the 4:2 compressor is ideal for the construction of
a regularly structured Wallace tree with low complexity. However, the accuracy obtained with the reversible 4:2
compressor was only 80%. Hence, in this paper, a new reversible 4:3 compressor is proposed and
used in the realization of the reversible 24x24 bit multiplier.

Fused elements are needed to increase the speed of signal processing applications. Three reversible
fused elements are therefore introduced in the proposed work and used in the realization of the Radix-2
FFT unit. The reversible fused add-sub (AS), reversible fused multiply-add (FMA) and
reversible fused multiply-sub (FMS) operations are implemented using existing reversible gates, and an
efficient reversible fused Radix-2 FFT unit is implemented using these three fused elements.

B. Design of a Reversible Radix-2 FFT Butterfly Unit as a Discrete and as a Fused Element

To show the importance of the fused FMA, FMS and AS units for FFT realization, the reversible
Radix-2 FFT butterfly unit has been implemented using both the discrete and the fused units. In the proposed
work, all lines are 24 bits wide and all operations are complex. The complex add, subtract and multiply
operations shown in figure 1 can be realized with a discrete implementation that uses two reversible real
adders to perform the complex add, two reversible real subtractors to perform the complex subtract, and four
reversible real multipliers, one reversible real adder and one reversible real subtractor to perform the complex multiply.



[Figure 1: block diagram containing two reversible 24-bit CLAs, two reversible 24-bit subtractors, four reversible 24x24 bit multipliers, one reversible 48-bit subtractor and one reversible 48-bit CLA]

Fig - 1. Discrete Implementation of the Reversible Radix-2 FFT Butterfly Unit

Thus, the discrete implementation of the reversible Radix-2 FFT butterfly unit requires two 24-bit reversible
carry look-ahead adders, two 24-bit reversible subtractors, four reversible 24x24 bit multipliers using 4:3
compressors, a 48-bit reversible carry look-ahead adder and a 48-bit reversible subtractor.
Alternatively, as shown in figure 2, the complex add and subtract can be performed with two reversible fused
add-subtract units (marked as FAS in figure 2) and the complex multiplication can be realized with two
reversible fused product units (marked as FMA and FMS).

Fig - 2. Fused Implementation of the Reversible Radix-2 FFT Butterfly Unit

In the above figure, the reversible 24-bit fused add-subtract unit consists of one reversible 24-bit
carry-look-ahead adder and one reversible 24-bit subtractor. The reversible 48-bit fused multiply-add (FMA) unit consists of
two reversible 24x24 bit multipliers and one 48-bit reversible carry-look-ahead adder. The reversible 48-bit fused
multiply-subtract (FMS) unit consists of two reversible 24x24 bit multipliers and one 48-bit reversible
subtractor.
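
The way the three fused operations combine into a butterfly can be sketched functionally as follows. This is a minimal illustration and not the authors' circuit: it assumes a decimation-in-frequency ordering (add and subtract first, then multiply the difference by the twiddle factor), which the text does not state explicitly, it uses unbounded Python integers instead of 24-bit and 48-bit words, and all function names are hypothetical.

```python
def fused_add_sub(a, b):
    """Fused add-subtract (FAS): produce the sum and the difference together."""
    return a + b, a - b


def fused_multiply_add(a, b, c, d):
    """Fused multiply-add (FMA): a*b + c*d."""
    return a * b + c * d


def fused_multiply_sub(a, b, c, d):
    """Fused multiply-subtract (FMS): a*b - c*d."""
    return a * b - c * d


def radix2_butterfly(x1, x2, w):
    """Radix-2 butterfly on (real, imag) integer pairs.

    Assumed decimation-in-frequency form:
    upper output = x1 + x2, lower output = (x1 - x2) * w.
    """
    # Two FAS units handle the real and imaginary parts.
    sum_r, diff_r = fused_add_sub(x1[0], x2[0])
    sum_i, diff_i = fused_add_sub(x1[1], x2[1])
    # Complex multiply: FMS gives the real part, FMA the imaginary part.
    prod_r = fused_multiply_sub(diff_r, w[0], diff_i, w[1])
    prod_i = fused_multiply_add(diff_r, w[1], diff_i, w[0])
    return (sum_r, sum_i), (prod_r, prod_i)


upper, lower = radix2_butterfly((3, 4), (1, -2), (2, 5))
print(upper, lower)
```

The operation count matches the fused description above: two add-subtract units, one multiply-subtract unit and one multiply-add unit, the last two each containing two multiplications.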



IV. Literature Survey 

Rashmi S. B., Tilak B. G. and Praveen B. [10] proposed an efficient half adder using the PRT-2
gate. The reversible half adder requires one PRT-2 gate, whose critical path delay is 2.
Himanshu Thapliyal and Hamid N. Arabnia [11] proposed a 4-bit carry-look-ahead adder using Fredkin,
TSG, Feynman and R2 gates. Their reversible carry-look-ahead adder is built around a reversible Carry
Propagate and Carry Generate adder (PGA). The reversible PGA consists of a Fredkin gate and a TSG gate.
The critical path delay of a single Fredkin gate is 6 and that of the TSG gate is 7, so the
net critical path delay of a reversible PGA is 13. The number of garbage outputs produced by this circuit is
4. The 4-bit reversible carry-look-ahead adder consists of 3 PGAs, 2 TSG gates, 4 Feynman gates, 6 Fredkin
gates and 1 R2 gate. The critical path delay of this 4-bit carry look-ahead adder is 93, the number of
garbage outputs it produces is 17, and the number of transistors required to implement it
is 74.

Several works have been carried out on reversible multipliers. A. Banerjee and A. Pathak [12] proposed a design
for a 4x4 bit multiplier circuit using HNG, TSG and MKG gates. The quantum costs of a single HNG gate,
TSG gate and MKG gate are 6, 10 and 10 respectively. The number of reversible gates required is 56 and the number of
garbage outputs produced by this circuit is 26. The circuit is not minimized in terms of quantum cost.
Masoumeh Shams, Majid Haghparast and Keivan Navi [13] proposed a reversible 4x4 bit multiplier circuit
using Peres and MKG gates. The number of reversible gates required is 52 and the number of garbage
outputs produced by this circuit is 56. The quantum delay of the MKG gate is unknown as the critical path of the
gate is not available. H. R. Bhagyalakshmi and M. K. Venkatesha [14] proposed another reversible 4x4
multiplier circuit using DPG and BVF gates. In this circuit, partial products are generated using Peres
gates. The total quantum cost of this circuit is 64, the number of gates required is 40 and the number of garbage
outputs produced is 52. The quantum delay of the BVF gate is unknown. Himanshu Thapliyal and M.
B. Srinivas [15] proposed a design for a reversible 4x4 multiplier circuit using TSG gates. In this circuit,
Fredkin gates are employed to generate the partial products. The number of reversible gates required to
realize this circuit is 29 and the number of garbage outputs produced is 58. The quantum delay of
the circuit is high, since the quantum delay of a single Fredkin gate is 5 and 15 Fredkin gates in total are
employed only to generate the partial products. Majid Haghparast, Somayyeh Jaffarali Jassbi, Keivan Navi and
Omid Hashemipour [16] proposed another 4x4 reversible multiplier circuit using HNG gates. In this circuit,
partial products are generated using Peres gates. The number of reversible gates required is 28 and the number of
garbage outputs produced is 52. The quantum delay of the HNG gate is also unknown. M. S. Islam, M.
M. Rahman, Z. Begum and M. Z. Hafiz [17] proposed a 4x4 reversible multiplier circuit using PFAG gates
in which the partial products are generated using Peres gates. The number of gates is 28 and the number of
garbage outputs produced is 52, the same as in [16]. The quantum delay of this gate is unknown.
Himanshu Thapliyal and M. B. Srinivas [18] proposed a design for an 8x8 bit multiplier circuit using TSG
gates. The partial products are generated using Fredkin gates in parallel. The number of reversible gates
required is 108. The drawback of this work is that the quantum delay of a single Fredkin gate is 5. The
circuit also requires a considerable amount of resources. Michael Nachtigal, Himanshu Thapliyal and
Nagarajan Ranganathan [19] proposed a design for an 8x8 bit multiplier in which the partial products are
generated by cascading Toffoli and Peres gates. The number of gates is 653, the number of garbage outputs
produced by this circuit is 129 and the quantum delay of the circuit is 447. The drawback of this work is that it
produces a larger number of garbage outputs and increases the quantum delay of the circuit.
Few works have been reported on reversible half subtractors. K. V. R. M. Murali, N. Sinha, T. S. Mahesh, M. H.
Levitt, K. V. Ramanathan and A. Kumar [20] proposed a design for a reversible half subtractor. The
drawback of this work is that the critical path delay of a single reversible gate is 4. H. Thapliyal and N.
Ranganathan [21] proposed a design for reversible binary subtractors using the new reversible TR gate. The
number of reversible gates required is one and the critical path delay associated with this circuit is 4.
Thus, the aim of this work is to reduce the number of garbage outputs, the critical path delay and the transistor count while
using fewer constant inputs.

V. Proposed Design Of A Reversible 24-Bit And 48-Bit Carry-Look-Ahead Adder

A. Design of the Proposed Reversible Carry-Propagate and Carry-Generate Adder (PGA)
The basic building block of the proposed reversible carry look-ahead adders is a novel 4-bit reversible
carry look-ahead adder, called NRCLA, built from PRT-2 gates. Since the PRT-2 gate has a low critical path
delay, the reversible Carry Propagate and Carry Generate adder is designed using a single PRT-2 gate. Using
PRT-2 gates throughout also minimizes the number of garbage outputs and the critical path delay of the 4-bit
reversible carry-look-ahead adder. The 24-bit carry look-ahead adder is then constructed from the reversible
NRCLA.

In a carry look-ahead adder, two new binary variables are considered:

$$P_i = A_i \oplus B_i$$
$$G_i = A_i B_i$$

Then the output sum and carry of a full adder can be rewritten in terms of $P_i$ and $G_i$ as follows:

$$S_i = A_i \oplus B_i \oplus C_i = P_i \oplus C_i$$
$$C_{i+1} = A_i B_i + (A_i \oplus B_i) C_i = G_i + P_i C_i$$
The 4 bit reversible Carry look-ahead adder is created by expanding the above equations. 
In the proposed 4-bit reversible carry look-ahead adder (NRCLA), Pi, Gi and Si are generated using the
reversible PRT-2 gate [10]. The block generating Pi, Gi and Si is called the PGA (Carry Propagate and Carry
Generate Adder).
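
The expansion of the propagate and generate equations into a 4-bit look-ahead adder can be sketched at the Boolean level as follows. This models only the logic function, not the PRT-2 gate realization, and the function name is hypothetical.

```python
def cla_4bit(a, b, carry_in=0):
    """Add two 4-bit integers using expanded carry look-ahead equations.

    Returns (sum, carry_out) with sum in the range 0..15.
    """
    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]
    # Propagate P_i = A_i xor B_i and generate G_i = A_i and B_i.
    p = [x ^ y for x, y in zip(a_bits, b_bits)]
    g = [x & y for x, y in zip(a_bits, b_bits)]
    # Each carry is written directly in terms of P, G and the input carry,
    # obtained by repeatedly substituting C_{i+1} = G_i + P_i C_i.
    c0 = carry_in
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    # Sum bits S_i = P_i xor C_i.
    total = sum((p[i] ^ carries[i]) << i for i in range(4))
    return total, c4


print(cla_4bit(9, 7))
```

Because every carry depends only on the inputs and the input carry, the four sum bits do not have to wait for a carry to ripple through the lower positions.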

The structure of the reversible PGA is shown in figure 3. The number of garbage outputs produced by this
circuit is 2 and the critical path delay of the proposed circuit is 4.

Fig - 3. Reversible Propagate Generate Adder (PGA)

The transistor representation of the reversible Propagate Generate Adder (PGA) is shown in figure 4.

Fig - 4. Transistor representation of Reversible Propagate Generate Adder (PGA)

In the above figure, Qj represents Pi, Pi represents Gi, P represents Ci+1 and Q represents Si.
A comparison of the proposed reversible propagate generate adder with the existing work is shown in table
II.

Table II. Comparison Of The Proposed Reversible Propagate Generate Adder

| Design | Number of reversible gates | Number of garbage outputs | Critical path delay |
|---|---|---|---|
| Himanshu Thapliyal, Hamid N. Arabnia [11] | 2 | 4 | 13 |
| Proposed Design | 2 | 2 | 4 |
| Improvement | - | 50% | 69% |

The number of garbage outputs is reduced by 2, an improvement of 50%. The critical
path delay of the circuit is 13 in the existing work and 4 in the proposed design, an improvement
of 69% over the existing work.

B. Design of the Proposed Reversible 4-Bit Carry-Look-Ahead Adder 

The complete structure of the reversible 4-bit carry look-ahead adder (NRCLA) using PRT-2 gates is shown 
in figure 5. 
Fig - 5. Reversible 4-Bit Carry Look-Ahead Adder

In the complete design of the NRCLA in figure 5, appropriate gates are used wherever required to generate the
function with the minimum number of reversible gates and garbage outputs. The unused outputs in figure 5
are the garbage outputs. The choice of gates at the appropriate places makes the design
efficient in terms of the number of reversible gates, garbage outputs and critical path delay. The number
of reversible gates required is 8. The proposed NRCLA produces 8 garbage outputs. The critical path delay
of the proposed NRCLA is 16. The number of transistors required to implement the proposed architecture is
56. A comparison of the proposed 4-bit reversible carry-look-ahead adder (NRCLA) with the existing work is
shown in table III.



Table III. Comparison Of The Proposed 4-Bit Reversible Carry-Look-Ahead Adder (NRCLA)

| Design | Number of reversible gates | Number of garbage outputs | Number of transistors required | Critical path delay |
|---|---|---|---|---|
| Himanshu Thapliyal, Hamid N. Arabnia [11] | 16 | 17 | 74 | 93 |
| Proposed Design | 8 | 8 | 56 | 16 |
| Improvement | 50% | 53% | 24% | 83% |



It is inferred that the proposed design has better performance compared to the existing work in terms of the 
number of reversible gates required, garbage outputs, the number of transistors required and critical path 
delay. 

C. Design of the Proposed Reversible 24-Bit Carry-Look-Ahead Adder

A 24-bit reversible carry-look-ahead adder is implemented by cascading six stages of the proposed 4-bit 
reversible carry-look-ahead adder (NRCLA). Table IV shows the number of gates required to implement the 
proposed 24-bit reversible carry-look-ahead adder. 

Table IV. Number Of Gates Required To Implement The Proposed 24-Bit Reversible Carry-Look-Ahead Adder

| Design | Number of reversible gates | Number of garbage outputs | Number of transistors required | Critical path delay |
|---|---|---|---|---|
| Proposed Design | 48 | 48 | 336 | 96 |



As seen from the table, the number of reversible gates required to implement the proposed design is 48, the
number of garbage outputs produced by the circuit is 48, the number of transistors required is 336
and the critical path delay of the circuit is 96. Since the proposed 4-bit reversible
carry-look-ahead adder (NRCLA) has a lower critical path delay than the work in Ref [11], the proposed
24-bit reversible carry-look-ahead adder significantly increases the speed of the circuit.

D. Design of the Proposed Reversible 48-Bit Carry-Look-Ahead Adder 

A 48-bit reversible carry-look-ahead adder (NRCLA) is implemented by cascading twelve stages of the 
proposed 4-bit reversible carry-look-ahead adder. Table V shows the number of gates required to implement 
the proposed 48-bit reversible carry-look-ahead adder. 



Table V. Number Of Gates Required To Implement The Proposed 48-Bit Reversible Carry-Look-Ahead Adder

| Design | Number of reversible gates | Number of garbage outputs | Number of transistors required | Critical path delay |
|---|---|---|---|---|
| Proposed Design | 96 | 96 | 672 | 192 |



As seen from the table, the number of reversible gates required to implement the proposed design is 96, the
number of garbage outputs produced by the circuit is 96, the number of transistors required is 672
and the critical path delay of the circuit is 192. Since the proposed 4-bit reversible
carry-look-ahead adder has a lower critical path delay than the work in Ref [11], the proposed 48-bit
reversible carry-look-ahead adder also significantly increases the speed of the circuit.
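
The figures in Tables IV and V follow from multiplying the 4-bit NRCLA figures by the number of cascaded stages. The short sketch below reproduces that arithmetic. The class and attribute names are hypothetical, and the linear scaling of the critical path delay is taken from the tables rather than derived from the circuit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cost:
    """Resource figures for one adder, as reported in the tables."""
    gates: int
    garbage_outputs: int
    transistors: int
    critical_path_delay: int

    def cascade(self, stages: int) -> "Cost":
        """Cost of cascading identical stages, scaling every figure linearly."""
        return Cost(
            self.gates * stages,
            self.garbage_outputs * stages,
            self.transistors * stages,
            self.critical_path_delay * stages,
        )


# 4-bit NRCLA figures quoted in the text for the proposed design.
NRCLA_4BIT = Cost(gates=8, garbage_outputs=8, transistors=56, critical_path_delay=16)

cla_24bit = NRCLA_4BIT.cascade(6)    # six 4-bit stages
cla_48bit = NRCLA_4BIT.cascade(12)   # twelve 4-bit stages
print(cla_24bit)
print(cla_48bit)
```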



VI. Design Of The Proposed Reversible 24-Bit Subtractor 



An efficient half subtractor using TR gates with a critical path delay of 3 was proposed by
Thapliyal and Ranganathan [21]. In this work, a reversible 24-bit subtractor is designed using
reversible full subtractors and a reversible half subtractor built from TR gates. The symbolic representations of
the reversible half subtractor and the reversible full subtractor using TR gates are shown in figures 6 and 7.

Fig - 6. Symbolic Representation of TR Gate as a half subtractor

Fig - 7. Symbolic Representation of TR Gate as a full subtractor

In this work, the reversible 24-bit subtractor is designed by cascading one reversible half subtractor and
twenty-three reversible full subtractors.

The critical path delay of the reversible full subtractor circuit constructed using two TR gates is 6.
Table VI shows the number of gates required to implement the proposed 24-bit reversible subtractor.



Table VI. Number Of Gates Required To Implement The Proposed 24-Bit Reversible Subtractor

| Design | Number of reversible gates | Number of garbage outputs | Critical path delay |
|---|---|---|---|
| Proposed Design | 47 | 47 | 141 |



Thus, the number of reversible gates required to implement the proposed design is 47. The number of 
garbage outputs produced by this circuit is 47. Critical path delay of the circuit is 141. 
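
The cascade of one half subtractor and twenty-three full subtractors can be modelled at the Boolean level as shown below, together with the gate and delay arithmetic behind Table VI. This is a functional sketch only: it does not model the TR gate itself, the function names are hypothetical, and the per-unit figures (one TR gate and delay 3 for the half subtractor, two TR gates and delay 6 for the full subtractor) are the ones quoted in this section.

```python
def half_subtractor(a, b):
    """Return (difference, borrow) for a - b on single bits."""
    return a ^ b, (a ^ 1) & b


def full_subtractor(a, b, borrow_in):
    """Full subtractor built from two half subtractors."""
    d1, b1 = half_subtractor(a, b)
    d2, b2 = half_subtractor(d1, borrow_in)
    return d2, b1 | b2


def ripple_subtract(a, b, width=24):
    """Subtract b from a bit by bit; returns (difference mod 2**width, borrow_out)."""
    result, borrow = 0, 0
    for i in range(width):
        a_bit, b_bit = (a >> i) & 1, (b >> i) & 1
        if i == 0:
            # The least significant position has no incoming borrow.
            diff, borrow = half_subtractor(a_bit, b_bit)
        else:
            diff, borrow = full_subtractor(a_bit, b_bit, borrow)
        result |= diff << i
    return result, borrow


# Gate count and critical path delay for 1 half + 23 full subtractors.
total_gates = 1 * 1 + 23 * 2
total_delay = 1 * 3 + 23 * 6
print(ripple_subtract(5, 3), total_gates, total_delay)
```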



VII. Proposed Design Of A Reversible 24x24 Bit Multiplier 
A. Wallace Tree Multipliers and 4:3 Compressor Design 

In high-speed designs, the Wallace tree construction method is usually used to add the partial products in a 
tree-like fashion in order to produce two rows of partial products that can be summed up in the last stage. 
Hence, in this paper, a new reversible 4:3 compressor has been proposed and the same is used in the 
realization of the reversible 24x24 bit multiplier. 

Thus, based on the property of a counter, a reversible 4:3 compressor is realized, where A, B, C and D are the
inputs of the counter and X, Y and Z are the three outputs, X being the LSB and Z the MSB. The input
combinations and the corresponding decimal counts of a full adder based 4:3 compressor are shown in
Table VII.



Table VII. Representation Of Full Adder As A Counter Based On 4:3 Compressor

| Input combination | Z | Y | X | Decimal count |
|---|---|---|---|---|
| Any three inputs are one | 0 | 1 | 1 | 3 |
| All the inputs are one | 1 | 0 | 0 | 4 |



The reversible 4:3 compressor is constructed using one reversible full adder and two reversible half adders as
shown in figure 8.

Fig - 8. Adder as a 4:3 Compressor

The accuracy obtained with the reversible 4:3 compressor was 100%. Since the PRT-2 gate has a lower critical path
delay than the Peres gate, the 8x8 bit multiplier is designed using Toffoli and PRT-2 gates.
Toffoli gates are used to generate the partial products and PRT-2 gates are used to add the partial products.
The 24x24 bit multiplier is then designed and implemented using this 8x8 bit multiplier.
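
The counter behaviour of a 4:3 compressor built from one full adder and two half adders can be checked exhaustively. The wiring below is one arrangement that produces the correct count. It is an assumption for illustration, since figure 8 is not reproduced here, and it models the Boolean function only, not the PRT-2 realization.

```python
from itertools import product


def half_adder(a, b):
    """Return (sum, carry) of two bits."""
    return a ^ b, a & b


def full_adder(a, b, c):
    """Return (sum, carry) of three bits."""
    return a ^ b ^ c, (a & b) | (c & (a ^ b))


def compressor_4_3(a, b, c, d):
    """Count the ones among four input bits; returns (X, Y, Z), X being the LSB."""
    s1, c1 = full_adder(a, b, c)   # a + b + c = s1 + 2*c1
    x, c2 = half_adder(s1, d)      # s1 + d = x + 2*c2
    y, z = half_adder(c1, c2)      # c1 + c2 = y + 2*z
    return x, y, z


for bits in product((0, 1), repeat=4):
    x, y, z = compressor_4_3(*bits)
    print(bits, "->", (z, y, x), "count", x + 2 * y + 4 * z)
```

Three output bits are needed because the count can reach 4 when all inputs are one, which a two-output compressor cannot represent.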

A. Symbolic Representation of a Reversible 4:3 Compressor Using PRT-2 Gate

The reversible 4:3 compressor is implemented using four reversible PRT-2 gates, as shown in figure 9.

Fig - 9. Reversible 4:3 compressor using PRT-2 gates

Thus, the reversible 4:3 compressor uses four primary inputs and three constant inputs. It produces three primary
outputs and four garbage outputs. The signals R1, R2, R3 and R represent the garbage outputs, Q3 represents
the sum, and Q and P represent the carry outputs. The critical path delay of the circuit is 8.

B. Proposed Design of 24x24 Bit Multiplier 

This unit is responsible for mantissa multiplication, which is performed on 24-bit operands. To design the 24x24
bit multiplier, the operands are decomposed into three partitions of 8 bits each, as shown in figure 10. This
approach, called the operand decomposition method, is mainly used to reduce switching activity in binary
multipliers.

Fig - 10. Reversible 24x24 Bit Multiplier Using Reversible 8x8 Bit Multiplier

Thus, the 24x24 bit multiplication is performed through nine 8x8 bit multipliers. Since the 24x24 bit 
multiplier has to be implemented using 8x8 bit multipliers, the 8x8 bit multiplier should be designed so as to 
be efficient in terms of the garbage output and critical path delay. Wallace tree multiplier architecture is used 
as it has a high speed and has been proved to be efficient [22]. 
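
The decomposition of a 24x24 bit multiplication into nine 8x8 bit multiplications can be illustrated arithmetically as follows. The sketch shows only how the nine sub-products are weighted and summed. The function names are hypothetical, and the final accumulation is written as plain integer addition rather than as a compressor tree.

```python
def split_24bit(value):
    """Split a 24-bit operand into [low, middle, high] bytes."""
    return [(value >> (8 * k)) & 0xFF for k in range(3)]


def multiply_24x24(a, b):
    """Multiply two 24-bit operands using nine 8x8 bit sub-products."""
    a_parts = split_24bit(a)
    b_parts = split_24bit(b)
    product = 0
    for i, a_part in enumerate(a_parts):
        for j, b_part in enumerate(b_parts):
            # Each 8x8 sub-product is at most 16 bits wide and is weighted
            # by the byte positions of its two operands.
            product += (a_part * b_part) << (8 * (i + j))
    return product


print(hex(multiply_24x24(0xABCDEF, 0x123456)))
```

The nine sub-products are independent of one another, which is what allows the 8x8 bit modules to operate in parallel.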

Wallace tree multiplication has three stages: partial product generation, partial product reduction using half
adders and full adders, and a final addition stage that generates the product. Hence, to design an 8x8 bit
multiplier, the first step is to generate all 64 one-bit partial products in a reversible manner. Partial
products are formed by ANDing the input bits together and passing them to the appropriate adder. The partial
products are generated using Toffoli and PRT-2 gates, because they provide the necessary AND operation when
their input C is hardwired to 0. Table VIII compares the proposed partial product generation unit with the work
in Ref [19] in terms of number of reversible gates, garbage outputs and critical path delay.
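
The use of a Toffoli gate with its third input tied to 0 as a reversible AND can be sketched as follows for the 64 partial products of an 8x8 bit multiplication. The Toffoli definition is the standard one. The PRT-2 gate is not modelled because its definition is not given in this text, and the function names are hypothetical.

```python
def toffoli(a, b, c):
    """Toffoli gate: (A, B, C) -> (A, B, C xor AB)."""
    return a, b, c ^ (a & b)


def partial_products_8x8(a, b):
    """Return an 8x8 grid of one-bit partial products, rows[j][i] = a_i and b_j."""
    rows = []
    for j in range(8):
        b_bit = (b >> j) & 1
        row = []
        for i in range(8):
            a_bit = (a >> i) & 1
            # With C hardwired to 0 the third output is exactly A and B.
            _, _, pp = toffoli(a_bit, b_bit, 0)
            row.append(pp)
        rows.append(row)
    return rows


def sum_partial_products(rows):
    """Add the weighted partial products (done by the adder tree in hardware)."""
    return sum(bit << (i + j) for j, row in enumerate(rows) for i, bit in enumerate(row))


rows = partial_products_8x8(200, 123)
print(sum(len(row) for row in rows), sum_partial_products(rows))
```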

The proposed design, which uses PRT-2 gates instead of Peres gates, performs better than
the existing work. The critical path delays of the existing work [19] and the proposed design are 143 and 128
respectively, an improvement of 10% for the proposed design.

Once the partial products have been generated, the final design of an 8x8 bit Wallace tree multiplier uses 22 
reversible half adders (RHA), 33 reversible full adders (RFA), and 14 Reversible 4:3 Compressors to produce 
the final product. The number of reversible gates required, garbage outputs produced and the critical path 
delay of the reversible 8x8 multiplier using 4:3 compressor is shown in table IX. 

It is inferred that the number of garbage outputs produced by the proposed design is 144 and it has a critical
path delay of 302. The number of reversible gates required could be reduced by implementing the proposed
design with the 4:2 compressor, but its accuracy is only 80%. Thus, if accuracy is to be met, the compromise is to increase
the number of reversible gates, which also increases the number of garbage outputs and hence the
critical path delay. Since accuracy is considered of utmost importance, the design is implemented
using the 4:3 compressor.

Table VIII. Comparison Of 8x8 Reversible Partial Product Generation Unit

| Design | No. of reversible gates | Garbage outputs | Critical path delay |
|---|---|---|---|
| M. Nachtigal, H. Thapliyal and N. Ranganathan [19] | 64 | 16 | 143 |
| Proposed Design | 64 | 8 | 128 |
| Improvement | - | 50% | 10% |

Table IX. Number Of Gates Required To Implement The Reversible 8x8 Multiplier Using 4:3 Compressor

| Design | No. of adders | No. of reversible gates | Garbage outputs | Critical path delay |
|---|---|---|---|---|
| Proposed Design Using 4:3 Compressor | 69 | 144 | 144 | 288 |

In the proposed design of the floating point multiplier, the 24-bit operands A and B are decomposed into
three partitions of 8 bits each. The 24x24 bit multiplication is thus performed using nine of the optimized
reversible 8x8 bit multipliers. Table X shows the number of reversible gates required, the garbage outputs
produced and the critical path delay of the reversible 24x24 multiplier using 4:3 compressors.

Table X. Number Of Gates Required To Implement The Reversible 24x24 Bit Multiplier Using 4:3 Compressor

| Design | No. of reversible gates | Garbage outputs | Critical path delay |
|---|---|---|---|
| Proposed Design | 127 | 321 | 642 |



VIII. Proposed Design Of The Fused Reversible Radix-2 FFT Unit 

A comparison of the reversible fused add-sub unit with the discrete version of the reversible adder and the 
reversible subtractor in terms of speed is shown in Table XI. 



Table XI. Comparison Of The Reversible Fused Add-Sub Unit With Discrete Version 

                                    DISCRETE ADDER AND SUBTRACTOR   FUSED ADD-SUB UNIT
Speed (Based on Reversible Logic)   45.114 ns                       10.966 ns

It is observed that the proposed reversible fused add-sub unit operates at a much greater speed than the 
discrete version of the reversible adder and reversible subtractor, i.e., an improvement of 76%. 

A comparison of the reversible fused multiply-add unit with the discrete version of the reversible multiplier 
and the reversible adder in terms of speed is shown in Table XII. 

Table XII. Comparison Of The Reversible Fused Multiply-Add Unit With Discrete Version 

                                    DISCRETE MULTIPLIER AND ADDER   FUSED MULTIPLY-ADD UNIT
Speed (Based on Reversible Logic)   70.293 ns                       29.402 ns

It is seen that the proposed reversible fused multiply-add unit operates at a much greater speed than the 
discrete version of the reversible multiplier and reversible adder, i.e., an improvement of 58%. 

A comparison of the reversible fused multiply-sub unit with the discrete version of the reversible multiplier 
and the reversible subtractor in terms of speed is shown in Table XIII. 

Table XIII. Comparison Of The Reversible Fused Multiply-Sub Unit With Discrete Version 

                                    DISCRETE REVERSIBLE MULTIPLIER AND REVERSIBLE SUBTRACTOR   FUSED REVERSIBLE MULTIPLY-SUB UNIT
Speed (Based on Reversible Logic)   67.163 ns                                                  19.409 ns

It is inferred that the proposed reversible fused multiply-sub unit operates at a much greater speed than the 
discrete version of the reversible multiplier and reversible subtractor, i.e., an improvement of 71%. 

A comparison of the reversible fused Radix-2 FFT butterfly unit with the discrete version of the reversible 
Radix-2 FFT butterfly unit in terms of speed is shown in Table XIV. 

Table XIV. Comparison Of The Reversible Fused Radix-2 FFT Butterfly Unit With Discrete Version 

                                                 DISCRETE RADIX-2 FFT BUTTERFLY UNIT   FUSED RADIX-2 FFT BUTTERFLY UNIT
Speed (Proposed Design using Reversible Logic)   34.892 ns                             32.636 ns

It is inferred that the proposed reversible fused Radix-2 FFT butterfly unit operates at a comparatively higher 
speed than the discrete version of the reversible Radix-2 FFT butterfly unit, i.e., an improvement of 7%. 
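
To make the role of the three fused operations concrete, the sketch below expresses a radix-2 butterfly, 
X = A + W*B and Y = A - W*B, in terms of an add-subtract, a multiply-add and a multiply-subtract primitive. 
It assumes a decimation-in-time butterfly and is only one plausible mapping, written as a plain 
floating-point Python model. The function names and operand ordering are assumptions, and nothing here 
models reversibility, garbage outputs or the 24-bit datapath.

```python
def fused_add_sub(a, b):
    """Return the sum and the difference of the same operand pair."""
    return a + b, a - b


def fused_multiply_add(a, b, c):
    """Return a*b + c."""
    return a * b + c


def fused_multiply_sub(a, b, c):
    """Return a*b - c."""
    return a * b - c


def radix2_butterfly(a, b, w):
    """Radix-2 butterfly on complex inputs a and b with twiddle factor w.

    Returns (a + w*b, a - w*b), computed on real and imaginary parts.
    """
    # Complex product t = w*b:
    #   Re(t) = w.re*b.re - w.im*b.im   (multiply-subtract)
    #   Im(t) = w.re*b.im + w.im*b.re   (multiply-add)
    t_re = fused_multiply_sub(w.real, b.real, w.imag * b.imag)
    t_im = fused_multiply_add(w.real, b.imag, w.imag * b.real)
    # One add-subtract per component yields both butterfly outputs.
    x_re, y_re = fused_add_sub(a.real, t_re)
    x_im, y_im = fused_add_sub(a.imag, t_im)
    return complex(x_re, x_im), complex(y_re, y_im)
```

For example, radix2_butterfly(complex(3, 4), complex(1, -2), complex(0, -1)) evaluates the butterfly for the 
twiddle factor -j. The add-subtract primitive is the part that is specific to the butterfly, since both the 
sum and the difference of the same operand pair are always needed together.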

IX. Synthesis Report And Simulation Results 

The entire unit was functionally verified and synthesized using the Xilinx Virtex XC5VLX50T-3FF1136 
as the target device. The synthesis report of all the modules is as follows: 

Table XV shows the comparison of the cell usage of the reversible fused Radix-2 FFT butterfly unit with the 
discrete version of the reversible Radix-2 FFT butterfly. 

Table XV. Synthesis Report Of The Reversible Fused Radix-2 FFT Butterfly Unit With Discrete Version 

                                   DISCRETE RADIX-2 FFT BUTTERFLY UNIT        FUSED RADIX-2 FFT BUTTERFLY UNIT
                                   No. of Slice LUTs   No. of Bonded IOBs     No. of Slice LUTs   No. of Bonded IOBs
Proposed work (Reversible logic)   6304                293                    3320                293


The total number of logic resources required to implement the reversible fused Radix-2 FFT butterfly unit 
is lower than that of the discrete version: 3320 slice LUTs against 6304, an improvement of 47%. The total 
cost of the reversible fused Radix-2 FFT butterfly unit is shown in Table XVI. 
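
The percentage improvements quoted for Tables XI to XV can be reproduced from the tabulated values as 
(discrete - fused) / discrete. The Python sketch below does this. The helper names and the dictionary layout 
are illustrative choices and are not part of the original design flow.

```python
def improvement_percent(discrete, fused):
    """Relative reduction of `fused` with respect to `discrete`, in percent."""
    if discrete <= 0:
        raise ValueError("discrete reference value must be positive")
    return 100.0 * (discrete - fused) / discrete


# (discrete, fused) pairs copied from Tables XI to XV.
# The first four entries are delays in ns; the last is the slice LUT count.
REPORTED = {
    "add-sub delay": (45.114, 10.966),
    "multiply-add delay": (70.293, 29.402),
    "multiply-sub delay": (67.163, 19.409),
    "butterfly delay": (34.892, 32.636),
    "butterfly slice LUTs": (6304, 3320),
}


def summarize():
    """Map each comparison to its improvement, rounded to one decimal place."""
    return {
        name: round(improvement_percent(discrete, fused), 1)
        for name, (discrete, fused) in REPORTED.items()
    }


if __name__ == "__main__":
    for name, percent in summarize().items():
        print(f"{name}: {percent}% improvement")
```

Evaluated this way, the butterfly delay improvement comes out at roughly 6.5%, which the text quotes as 7%. 
The remaining figures agree with the quoted values after rounding to the nearest percent.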

All the modules, namely the reversible carry-propagate and carry-generate adder (PGA), reversible 4-bit 
carry-look-ahead adder, reversible 24-bit carry-look-ahead adder, reversible 48-bit carry-look-ahead adder, 
reversible 24-bit subtractor, reversible 48-bit subtractor, reversible 24x24 bit multiplier, reversible fused 
add-sub (AS) unit, reversible fused multiply-add (FMA) unit, reversible fused multiply-subtract (FMS) unit, 
discrete reversible Radix-2 FFT butterfly unit and reversible fused Radix-2 FFT butterfly unit, were 
implemented using the structural style of modeling. VHDL is used to describe the functionality of all the 
modules. All the units were functionally verified using the ModelSim 10.1 simulator. Figures 11 and 12 show 
the simulation results of the discrete and fused reversible Radix-2 FFT units, respectively. 

Fig. 11. Discrete Reversible Radix-2 FFT Butterfly Unit 

Fig. 12. Fused Reversible Radix-2 FFT Butterfly Unit 



X. Conclusions 

This paper describes the design of three reversible fused arithmetic units and their application to the 
implementation of FFT butterfly operations. Although the reversible fused add-subtract unit is specific to 
FFT applications, the reversible fused multiply-add unit and the reversible fused multiply-sub unit are 
applicable to a wide variety of digital signal processing algorithms. All three reversible fused arithmetic 
units (the fused add-sub unit, the fused multiply-add unit and the fused multiply-sub unit) are faster than 
parallel implementations constructed from discrete units. The proposed reversible fused Radix-2 FFT 
butterfly unit operates at a slightly greater speed than the discrete version of the reversible Radix-2 FFT 
butterfly unit, an improvement of 7%, and also consumes fewer logic resources, an improvement of 47%. 